Identifying Personal Narratives in Chinese Weblog Posts

Authors

  • Andrew S. Gordon
  • Luwen Huangfu
  • Kenji Sagae
  • Wenji Mao
  • Wen Chen

Abstract

Automated text classification technologies have enabled researchers to amass enormous collections of personal narratives posted to English-language weblogs. In this paper, we explore analogous approaches to identify personal narratives in Chinese weblog posts as a precursor to future empirical studies of cross-cultural differences in narrative structure. We describe the collection of over half a million posts from a popular Chinese weblog hosting service, and the manual annotation of story and nonstory content in sampled posts. Using supervised machine learning methods, we developed an automated text classifier for personal narratives in Chinese posts, achieving classification accuracy comparable to previous work in English. Using this classifier, we automatically identify over sixty-four thousand personal narratives for use in future cross-cultural analyses and Chinese-language applications of narrative corpora.
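
As a rough illustration of the supervised story/nonstory classification mentioned in the abstract, the sketch below trains a tiny classifier on labeled Chinese text. The abstract does not say which features or learning algorithm were used, so everything here is an assumption made for illustration: the character-bigram features (chosen because Chinese text has no whitespace word boundaries), the multinomial Naive Bayes model, the `NaiveBayes` class name, and the four invented training sentences. It is not the authors' method or data.

```python
import math
from collections import Counter, defaultdict


def char_bigrams(text):
    """Split text into overlapping character bigrams (no word segmenter needed)."""
    chars = [c for c in text if not c.isspace()]
    return [a + b for a, b in zip(chars, chars[1:])]


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def __init__(self):
        self.feature_counts = defaultdict(Counter)  # label -> bigram counts
        self.label_counts = Counter()               # label -> number of posts
        self.vocab = set()                          # all bigrams seen in training

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            feats = char_bigrams(text)
            self.feature_counts[label].update(feats)
            self.label_counts[label] += 1
            self.vocab.update(feats)
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for feat in char_bigrams(text):
                if feat in self.vocab:  # ignore bigrams never seen in training
                    score += math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples standing in for manually annotated posts.
train_texts = [
    "昨天我和朋友去爬山，后来下雨了，我们只好回家。",
    "上周我第一次坐飞机，我很紧张，后来睡着了。",
    "本店新品上市，全场八折，欢迎选购。",
    "这款手机的屏幕很大，电池容量也很高。",
]
train_labels = ["story", "story", "nonstory", "nonstory"]

clf = NaiveBayes().fit(train_texts, train_labels)
print(clf.predict("昨天我和同学去公园，后来下雨了。"))
```

A real system along these lines would need thousands of annotated posts and a held-out evaluation set; the toy data only shows the shape of the pipeline (annotate, extract features, train, predict).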

Similar resources

Identifying Personal Narratives in Chinese Weblog Posts

Automated text classification technologies have enabled researchers to amass enormous collections of personal narratives posted to English-language weblogs. In this paper, we explore analogous approaches to identify personal narratives in Chinese weblog posts as a precursor to future empirical studies of cross-cultural differences in narrative structure. We describe the collection of over h...

A Data-Driven Approach for Classification of Subjectivity in Personal Narratives

Personal narratives typically involve a narrator who participates in a sequence of events in the past. The narrator is therefore present at two narrative levels: (1) the extradiegetic level, where the act of narration takes place, with the narrator addressing an audience directly; and (2) the diegetic level, where the events in the story take place, with the narrator as a participant (usually t...

Adaptive Weblog Post Filtering Based on User Browsing History

One of the most important Web-based services that established the foundations of Web 2.0 is the weblog. Weblogs are evolving into topic-based systems that can generate more revenue for companies, so many companies provide free weblog hosting. Weblog popularity is an important factor in gaining more revenue. Weblogs have posts and topics that are arranged chronologically with the most re...

Leave a Reply: An Analysis of Weblog Comments

Access to weblogs, both through commercial services and in academic studies, is usually limited to the content of the weblog posts. This overlooks an important aspect distinguishing weblogs from other web pages: the ability of weblog readers to respond to posts directly, by posting comments. In this paper we present a large-scale study of weblog comments and their relation to the posts. Using a...

Distinguishing Affective States in Weblog Posts

This short paper reports on initial experiments on the use of binary classifiers to distinguish affective states in weblog posts. Using a corpus of English weblog posts, annotated for mood by their authors, we trained support vector machine binary classifiers, and show that a typology of affective states proposed by Scherer et al. is a good starting point for more

Publication date: 2013